Historical Document Image Segmentation with LDA-Initialized Deep Neural Networks
In this paper, we present a novel approach to layer-wise weight initialization
of deep neural networks using Linear Discriminant Analysis (LDA). Typically, the
weights of a deep neural network are initialized using random values, greedy
layer-wise pre-training (usually as a Deep Belief Network or as an
auto-encoder), or layers re-used from another network (transfer learning).
Hence, many training epochs are needed before meaningful weights are learned,
or a rather similar dataset is required for seeding the fine-tuning step of
transfer learning. In this paper, we describe how to turn an LDA into either a
neural layer or a classification layer. We analyze the initialization technique
on historical documents. First, we show that an LDA-based initialization is
quick and leads to a very stable initialization. Furthermore, for the task of
layout analysis at pixel level, we investigate the effectiveness of LDA-based
initialization and show that it outperforms state-of-the-art random weight
initialization methods.
Comment: 5 pages
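To illustrate what turning an LDA into a classification layer can mean, the sketch below uses the textbook linear discriminant functions (shared within-class covariance, class means, class priors) and returns them as the weight matrix and bias of a fully connected layer. This is an assumption-based illustration, not necessarily the authors' exact construction; the function name `lda_linear_layer` and the ridge parameter `reg` are hypothetical, and the neural-layer (projection) variant mentioned in the abstract is not shown.

```python
import numpy as np


def lda_linear_layer(X, y, reg=1e-6):
    """Hypothetical helper: derive linear-layer weights from an LDA fit.

    X: (n_samples, n_features) features, y: (n_samples,) integer labels.
    Returns W of shape (n_classes, n_features) and b of shape (n_classes,)
    so that logits = X @ W.T + b are the LDA discriminant scores.
    """
    classes = np.unique(y)  # sorted unique labels
    n, d = X.shape
    means = np.stack([X[y == c].mean(axis=0) for c in classes])
    priors = np.array([np.mean(y == c) for c in classes])
    # Pooled within-class covariance shared by all classes (the LDA assumption).
    centered = X - means[np.searchsorted(classes, y)]
    cov = centered.T @ centered / (n - len(classes))
    cov += reg * np.eye(d)  # small ridge term keeps the matrix invertible
    cov_inv = np.linalg.inv(cov)
    # Discriminant k: w_k = Sigma^-1 mu_k, b_k = -1/2 mu_k^T Sigma^-1 mu_k + log pi_k
    W = means @ cov_inv
    b = -0.5 * np.einsum("kd,kd->k", W, means) + np.log(priors)
    return W, b


# Usage: two synthetic 2-D classes; the layer is ready to classify before any training.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2.0, 1.0, (50, 2)), rng.normal(2.0, 1.0, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
W, b = lda_linear_layer(X, y)
pred = np.argmax(X @ W.T + b, axis=1)
```

Because `W` has shape (out_features, in_features), it matches the layout a layer such as PyTorch's `nn.Linear` expects for its `weight` and `bias`, so the arrays could be copied in directly as an initialization that is already meaningful before the first epoch.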
DeepDIVA: A Highly-Functional Python Framework for Reproducible Experiments
We introduce DeepDIVA: an infrastructure designed to enable quick and
intuitive setup of reproducible experiments with a large range of useful
analysis functionality. Reproducing scientific results can be a frustrating
experience, not only in document image analysis but in machine learning in
general. Using DeepDIVA a researcher can either reproduce a given experiment
with a very limited amount of information or share their own experiments with
others. Moreover, the framework offers a large range of functions, such as
boilerplate code, keeping track of experiments, hyper-parameter optimization,
and visualization of data and results. To demonstrate the effectiveness of this
framework, this paper presents case studies in the area of handwritten document
analysis where researchers benefit from the integrated functionality. DeepDIVA
is implemented in Python and uses the deep learning framework PyTorch. It is
completely open source and accessible as a web service through DIVAServices.
Comment: Submitted to the 16th International Conference on Frontiers in
Handwriting Recognition (ICFHR), 6 pages, 6 figures
Survey of Artificial Intelligence for Card Games and Its Application to the Swiss Game Jass
In the last decades we have witnessed the success of applications of
Artificial Intelligence to playing games. In this work we address the
challenging field of games with hidden information and card games in
particular. Jass is a very popular card game in Switzerland and is closely
connected with Swiss culture. To the best of our knowledge, Artificial
Intelligence agents do not yet outperform top players in the game of Jass.
Our contribution to the community is two-fold. First, we provide
an overview of the current state-of-the-art of Artificial Intelligence methods
for card games in general. Second, we discuss their application to the use-case
of the Swiss card game Jass. This paper aims to be an entry point for both
seasoned researchers and new practitioners who want to join in the Jass
challenge.
A Comprehensive Study of ImageNet Pre-Training for Historical Document Image Analysis
Automatic analysis of scanned historical documents comprises a wide range of
image analysis tasks, which are often challenging for machine learning due to a
lack of human-annotated learning samples. With the advent of deep neural
networks, a promising way to cope with the lack of training data is to
pre-train models on images from a different domain and then fine-tune them on
historical documents. In the current research, a typical example of such
cross-domain transfer learning is the use of neural networks that have been
pre-trained on the ImageNet database for object recognition. It remains a
mostly open question whether or not this pre-training helps to analyse
historical documents, which have fundamentally different image properties when
compared with ImageNet. In this paper, we present a comprehensive empirical
survey on the effect of ImageNet pre-training for diverse historical document
analysis tasks, including character recognition, style classification,
manuscript dating, semantic segmentation, and content-based retrieval. While we
obtain mixed results for semantic segmentation at pixel-level, we observe a
clear trend across different network architectures that ImageNet pre-training
has a positive effect on classification as well as content-based retrieval.
Improving Reproducible Deep Learning Workflows with DeepDIVA
The field of deep learning is experiencing a trend towards producing
reproducible research. Nevertheless, it is still often a frustrating experience
to reproduce scientific results. This is especially true in the machine
learning community, where it is considered acceptable to have black boxes in
one's experiments. We present DeepDIVA, a framework designed to facilitate easy
experimentation and the reproduction of experiments. This framework allows
researchers to share their experiments with others, while providing
functionality that allows for easy experimentation, such as boilerplate code,
experiment management, hyper-parameter optimization, verification of data
integrity, and visualization of data and results. Additionally, the code of
DeepDIVA is well documented and supported by several tutorials that allow new
users to quickly familiarize themselves with the framework.
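One of the listed features, verification of data integrity, is commonly implemented by fingerprinting the dataset so that a reproduced experiment can confirm it sees exactly the same files. The sketch below shows that general idea with the standard library only; it is not DeepDIVA's actual mechanism, and the function name `dataset_fingerprint` is hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def dataset_fingerprint(root):
    """Hypothetical helper: one SHA-256 digest covering every file under `root`.

    Files are visited in a platform-independent sorted order. For each file, its
    relative path and a hash of its contents are fed to the digest, so adding,
    removing, renaming or editing any file changes the fingerprint.
    """
    root = Path(root)
    files = sorted(
        (p for p in root.rglob("*") if p.is_file()),
        key=lambda p: p.relative_to(root).as_posix(),
    )
    digest = hashlib.sha256()
    for path in files:
        digest.update(path.relative_to(root).as_posix().encode("utf-8"))
        digest.update(b"\0")  # separator: paths cannot contain NUL bytes
        digest.update(hashlib.sha256(path.read_bytes()).digest())
    return digest.hexdigest()


# Usage: the fingerprint is stable across runs and changes when a file is edited.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "train").mkdir()
    (Path(tmp) / "train" / "a.txt").write_text("page 1")
    before = dataset_fingerprint(tmp)
    again = dataset_fingerprint(tmp)
    (Path(tmp) / "train" / "a.txt").write_text("page 2")
    after = dataset_fingerprint(tmp)
```

Storing such a digest alongside an experiment's configuration lets anyone re-running it detect a silently modified or incomplete dataset before comparing results.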
Cross-Depicted Historical Motif Categorization and Retrieval with Deep Learning
In this paper, we tackle the problem of categorizing and identifying cross-depicted historical motifs using recent deep learning techniques, with the aim of developing a content-based image retrieval system. By cross-depiction, we mean that the same object can be represented (depicted) in various ways. The objects of interest in this research are watermarks, which are crucial for dating manuscripts. For watermarks, cross-depiction arises for two reasons: (i) there are many similar representations of the same motif, and (ii) there are several ways of capturing the watermarks, i.e., as the watermarks are not visible on a scan or photograph, they are typically retrieved via hand tracing, rubbing, or special photographic techniques. This leads to different representations of the same (or similar) objects, making it hard for pattern recognition methods to recognize the watermarks. While this is a simple problem for human experts, computer vision techniques have difficulty generalizing across the various depiction possibilities. In this paper, we present a study where we use deep neural networks for categorization of watermarks with varying levels of detail. The macro-averaged F1-score on an imbalanced 12-category classification task is 88.3%, and the multi-labelling performance (Jaccard Index) on a 622-label task is 79.5%. To analyze the usefulness of an image-based system for assisting humanities scholars in cataloguing manuscripts, we also measure the performance of similarity matching on expert-crafted test sets of varying sizes (50 and 1000 watermark samples). A significant outcome is that all relevant results belonging to the same super-class are found by our system (Mean Average Precision of 100%), despite the cross-depicted nature of the motifs. This result has not been achieved in the literature so far.
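The two reported metrics can be computed as in the sketch below, which uses the standard definitions: macro-averaged F1 is the unweighted mean of per-class F1 scores, and the Jaccard Index compares the true and predicted label sets. The abstract does not say how the Jaccard Index is averaged, so averaging over samples is an assumption here; the function names are hypothetical.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores; every class counts equally."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn  # F1 = 2TP / (2TP + FP + FN)
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def mean_jaccard(true_sets, pred_sets):
    """Sample-averaged Jaccard index |T & P| / |T | P| for multi-label outputs."""
    total = 0.0
    for t, p in zip(true_sets, pred_sets):
        union = t | p
        total += len(t & p) / len(union) if union else 1.0
    return total / len(true_sets)


# Usage with small toy labels (not data from the paper).
f1 = macro_f1([0, 0, 0, 1, 1, 2], [0, 0, 1, 1, 1, 2])
jac = mean_jaccard([{1, 2}, {3}], [{1}, {3, 4}])
```

Macro averaging is the natural choice for an imbalanced category set: a rare watermark class contributes as much to the score as a frequent one, so a classifier cannot score well by only getting the common classes right.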
Combining graph edit distance and triplet networks for offline signature verification
Offline signature verification is a challenging pattern recognition task where a writer model is inferred using only a small number of genuine signatures. A combination of complementary writer models can make it more difficult for an attacker to deceive the verification system. In this work, we propose to combine a recent structural approach based on graph edit distance with a statistical approach based on deep triplet networks. The combination of the structural and statistical models achieves significant improvements in performance on four publicly available benchmark datasets, highlighting their complementary perspectives.
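The abstract does not state how the two writer models are combined. A common option, assumed here purely for illustration, is score-level fusion: each dissimilarity (graph edit distance and triplet-embedding distance) is standardized against the distances observed among the writer's genuine reference signatures, and the two standardized scores are then averaged with a weight. The function names and the `weight` parameter are hypothetical.

```python
from statistics import mean, pstdev


def znorm(score, reference_scores):
    """Standardize a dissimilarity against scores observed among genuine references."""
    mu = mean(reference_scores)
    sigma = pstdev(reference_scores) or 1.0  # fall back to 1.0 to avoid division by zero
    return (score - mu) / sigma


def fused_dissimilarity(ged, triplet, ged_refs, triplet_refs, weight=0.5):
    """Weighted sum of z-normalized graph edit distance and triplet-network distance."""
    return weight * znorm(ged, ged_refs) + (1 - weight) * znorm(triplet, triplet_refs)


# Usage with toy numbers: distances among the writer's references define "typical".
ged_refs = [10.0, 12.0, 14.0]
triplet_refs = [0.2, 0.3, 0.4]
genuine = fused_dissimilarity(12.0, 0.3, ged_refs, triplet_refs)
forgery = fused_dissimilarity(20.0, 0.9, ged_refs, triplet_refs)
```

Normalization matters because the raw scales differ by orders of magnitude (edit costs versus embedding distances); without it, the larger-valued model would dominate the sum. A questioned signature would then be accepted when its fused score falls below a threshold tuned on validation data.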